original 24 May 2016 at 11:35

I2P: Accelerating Asymmetric Cryptography with Tables

3 min

8.2K

Asymmetric cryptography in I2P has always led to slowdowns: the Diffie-Hellman algorithm when establishing transport sessions and, in my opinion, the unfortunate choice of the ElGamal scheme in I2P addresses. This is especially noticeable when working on weak hardware and floodfills. The approach proposed in the article is based on the use of some features of I2P and allows you to achieve significant acceleration and reduce the load on the processor.

Asymmetric encryption in I2P

Currently, asymmetric encryption is based on the operation of exponentiation modulo and the hypothesis that discrete logarithms are practically impossible. There are plans to use other types of asymmetric encryption, but they have not yet been implemented.
The public key y is calculated based on the private key x using the formula y=g^x mod p, where p is a prime, and in the general case a triple of numbers (y, p, g) is passed as the key. In I2P, the numbers p and g are constants and must be the same for all implementations:
g = 2,
p = 2^2048 - 2^1984 - 1 + 2^64 * ([pi*2^1918] + 124476),
where pi denotes the number pi, and [] is the operation of taking the integer part of a number.
The length of p is 256 bytes, respectively, the length of the public key.

According to official documentation secret key length is 2048 bits for x64 and 226 bits for all others.

Calculated tables

The use of calculated tables was previously discussed for EdDSA signatures.
Let us briefly recall the essence. An addition operation is defined over the curve points and it is required, using this operation, to multiply the constant base point by a 32-byte number. To do this, we will add byte by byte, taking the result of multiplying a point by the value of each byte from the table, calculated once at startup. This will require exactly 32 additions, while the calculation time becomes constant, which is extremely important for multiplying with a secret key.

We will do the same for the operation of raising g to a power modulo p, only instead of the addition operation we will use the multiplication operation.
Let's calculate a table of all possible values of powers of p for each byte, filling in an array of 255 elements for each byte, multiplying in order by g=2.
You will notice that the first elements of the table do not need to be stored, since they are obtained by setting a bit in the corresponding position, however, the value should not exceed p, in fact this corresponds to 2048 = 11 bits, that is, only one and a third of the table bytes.

Thus, for a private key with a length of 226 bits, the table size is 29*255*255 ~ 2 megabytes, for a key with a length of 2048, the table size will be 256*255*255 ~ 16 megabytes, which can be a significant amount, but in this case the effect is obtained more significant.

Montgomery Multiplication

Since each exponentiation requires at least 29 multiplications modulo, it is advisable to use Montgomery multiplication.
Its meaning comes down to replacing one module with another, more convenient module for calculations. For practical needs, a power of two is chosen and then slow division is replaced by a fast bit shift.
The disadvantage is the need to convert to the Montgomery representation. However, in our case, this is done only once at the time of calculation of the table and the only additional action is to convert the result.
For practical purposes, we do not need to implement Montgomery multiplication - the corresponding functions are provided in OpenSSL.

int BN_MONT_CTX_set(BN_MONT_CTX *mont, const BIGNUM *m, BN_CTX *ctx);

specifies the module (in our case p)

int BN_mod_mul_montgomery(BIGNUM *r, BIGNUM *a, BIGNUM *b,  BN_MONT_CTX *mont, BN_CTX *ctx);

Montgomery multiplication itself in the context with a given modulus

int BN_from_montgomery(BIGNUM *r, BIGNUM *a, BN_MONT_CTX *mont, BN_CTX *ctx);

converting the result to normal representation.

Implementation in i2pd

Since release 2.7.0 supported and controlled by parameter precomputation.elgamal, by default it is disabled for x64 and enabled for others. It is recommended to enable it to work in floodfill mode if there are no strict restrictions on memory use. In this case, the processor load is reduced by more than 2 times due to the need to frequently establish connections with other nodes.
It is used for generating key pairs when establishing connections, asymmetric encryption when building tunnels, and “garlic” encryption between addresses. Similarly, DSA signing can be accelerated, but in I2P it is considered obsolete and is gradually being replaced by EdDSA.
The approach discussed in the article may seem trivial, but its implementation has shown its effectiveness, making it possible to work on those platforms that previously lacked performance.

Tags:

Hubs:

Comments 3

A UFO flew in and published this inscription here

original 25 May 2016 at 05:53

>I can say that optimizing low-level functions (addition, multiplication, division, modulus, etc.) correctly does not give a very large increase in speed

This is not optimization of low-level functions, but use of the fact that p is the same thing.

>I have a strong suspicion that at least some of those methods can be adapted for exponentiation.

Most likely yes. This is just the first step to speed up your work..

>Using tables means that the code becomes vulnerable to timing attacks

Quite the contrary, in this case the calculation time becomes constant, in contrast to a simple “double-and-add”, in which the signature time will depend on the number of one bits in the key.

>«Length p - 256 bytes" - a lot of something, probably bits.

A byte is 2048 bits. Hence all the brakes.

dmitryredkin 25 May 2016 at 16:17

Oh, I'll have to try it! Otherwise I had to turn off i2p on the router (atom 1.6GHz), because... it gave a constant load of 70% on each of the 2 cores. All options for improving performance have been tried and did not provide any improvement..

Only full-fledged users can leave comments. Sign in, Please.